AITopics | adp algorithm

d7f426ccbc6db7e235c57958c21c5dfa-Paper.pdf

Neural Information Processing Systems | Feb-10-2026, 15:15:31 GMT

algorithm, discor, function approximation, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

Neural Information Processing Systems | Sep-30-2025, 11:57:37 GMT

Tetris is a popular video game that has been widely used as a benchmark for various optimization techniques, including approximate dynamic programming (ADP) algorithms. A close look at the literature of this game shows that while ADP algorithms that have been (almost) entirely based on approximating the value function (value function based) have performed poorly in Tetris, the methods that search directly in the space of policies by learning the policy parameters using an optimization black box, such as the cross entropy (CE) method, have achieved the best reported results. This makes us conjecture that Tetris is a game in which good policies are easier to represent, and thus to learn, than their corresponding value functions. So, in order to obtain good performance with ADP, we should use ADP algorithms that search in a policy space, instead of the more traditional ones that search in a value function space. In this paper, we put our conjecture to the test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris. Our extensive experimental results show that for the first time an ADP algorithm, namely CBMPI, obtains the best results reported in the literature for Tetris on both small $10\times 10$ and large $10\times 20$ boards. Although CBMPI's results are similar to those achieved by the CE method on the large board, CBMPI uses considerably fewer (almost 1/10) samples (calls to the generative model of the game) than CE.
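
For readers unfamiliar with the cross entropy (CE) method mentioned in the abstract, the following is a minimal sketch of the general idea: sample policy parameters from a Gaussian, keep the best-scoring samples, and refit the Gaussian to them. It is not the authors' implementation. The function names, the hyperparameters, and the `toy_score` objective (a hypothetical stand-in for evaluating a Tetris policy by playing games) are all assumptions made for illustration.

```python
import random
import statistics


def cross_entropy_search(score, dim, iterations=40, population=50,
                         elite_frac=0.2, init_std=5.0, seed=0):
    """Maximize score(theta) with a diagonal-Gaussian cross-entropy method."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate parameter vectors from the current Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the highest-scoring candidates (the "elites").
        elites = sorted(samples, key=score, reverse=True)[:n_elite]
        # Refit the Gaussian to the elites, one coordinate at a time.
        # The small constant is an assumed noise floor against early collapse.
        columns = list(zip(*elites))
        mean = [statistics.fmean(c) for c in columns]
        std = [statistics.pstdev(c) + 1e-3 for c in columns]
    return mean


def toy_score(theta):
    # Hypothetical objective: in a Tetris setting this would instead be
    # something like the average number of lines removed by the policy
    # whose feature weights are theta.
    target = [1.0, -2.0, 0.5]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


best = cross_entropy_search(toy_score, dim=3)
```

Each call to `score` here corresponds to one policy evaluation, which is why the number of samples drawn from the game's generative model is the natural cost measure when comparing CE with CBMPI.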

artificial intelligence, machine learning, reinforcement learning, (10 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

d7f426ccbc6db7e235c57958c21c5dfa-Paper.pdf

Neural Information Processing Systems | Aug-16-2025, 16:45:26 GMT

algorithm, discor, function approximation, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Non-parametric Approximate Dynamic Programming via the Kernel Method

Neural Information Processing Systems | Mar-14-2024, 11:52:40 GMT

This paper presents a novel non-parametric approximate dynamic programming (ADP) algorithm that enjoys graceful approximation and sample complexity guarantees. In particular, we establish both theoretically and computationally that our proposal can serve as a viable alternative to state-of-the-art parametric ADP algorithms, freeing the designer from carefully specifying an approximation architecture. We accomplish this by developing a kernel-based mathematical program for ADP. Via a computational study on a controlled queueing network, we show that our procedure is competitive with parametric ADP approaches.

approximation, approximation architecture, rsalp, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

Neural Information Processing Systems | Mar-13-2024, 17:23:25 GMT

Tetris is a video game that has been widely used as a benchmark for various optimization techniques including approximate dynamic programming (ADP) algorithms. A look at the literature of this game shows that while ADP algorithms that have been (almost) entirely based on approximating the value function (value function based) have performed poorly in Tetris, the methods that search directly in the space of policies by learning the policy parameters using an optimization black box, such as the cross entropy (CE) method, have achieved the best reported results. This makes us conjecture that Tetris is a game in which good policies are easier to represent, and thus, learn than their corresponding value functions. So, in order to obtain a good performance with ADP, we should use ADP algorithms that search in a policy space, instead of the more traditional ones that search in a value function space. In this paper, we put our conjecture to test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris. Our experimental results show that for the first time an ADP algorithm, namely CBMPI, obtains the best results reported in the literature for Tetris in both small $10\times 10$ and large $10\times 20$ boards. Although the CBMPI's results are similar to those of the CE method in the large board, CBMPI uses considerably fewer (almost 1/6) samples (calls to the generative model) than CE.

algorithm, tetris, value function, (17 more...)

Neural Information Processing Systems

Country: Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching

Luo, Junliang, Xu, Yi Tian, Wu, Di, Jenkin, Michael, Liu, Xue, Dudek, Gregory

arXiv.org Artificial Intelligence | Oct-30-2023

Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-generation cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method, coupled with online optimization, that switches the cells of base stations on and off to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. A multilayer perceptron (MLP) predicts the power consumption for each state-action pair; this prediction approximates the value function in ADP and is used to select the action with the best expected power saving. To save as much power as possible without deteriorating QoS, we include a second MLP to predict QoS and a long short-term memory (LSTM) network to predict handovers. Both are incorporated into an online optimization algorithm that produces an adaptive QoS threshold for filtering cell-switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator on various real-world scenarios with dynamic traffic patterns.

base station, handover, power consumption, (15 more...)

arXiv.org Artificial Intelligence

2310.12999

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe (0.04)
Africa > Togo (0.04)

Genre: Research Report (0.40)

Industry: Telecommunications > Networks (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)

Does on-policy data collection fix errors in off-policy reinforcement learning?

AIHub | May-6-2020, 12:34:44 GMT

Reinforcement learning has seen a great deal of success in solving complex decision-making problems ranging from robotics to games to supply chain management to recommender systems. Despite their success, deep reinforcement learning algorithms can be exceptionally difficult to use, due to unstable training, sensitivity to hyperparameters, and generally unpredictable and poorly understood convergence properties. Multiple explanations, and corresponding solutions, have been proposed for improving the stability of such methods, and we have seen good progress over the last few years on these algorithms. In this blog post, we analyze a central and underexplored reason behind some of the problems with the class of deep RL algorithms based on dynamic programming, which encompasses the popular DQN and soft actor-critic (SAC) algorithms – the detrimental connection between data distributions and learned models. Before describing this problem in detail, let us quickly recap some of the main concepts in dynamic programming.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

AIHub

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Does on-policy data collection fix errors in off-policy reinforcement learning?

Robohub | Mar-19-2020, 00:48:17 GMT

Reinforcement learning has seen a great deal of success in solving complex decision-making problems ranging from robotics to games to supply chain management to recommender systems. Despite their success, deep reinforcement learning algorithms can be exceptionally difficult to use, due to unstable training, sensitivity to hyperparameters, and generally unpredictable and poorly understood convergence properties. Multiple explanations, and corresponding solutions, have been proposed for improving the stability of such methods, and we have seen good progress over the last few years on these algorithms. In this blog post, we analyze a central and underexplored reason behind some of the problems with the class of deep RL algorithms based on dynamic programming, which encompasses the popular DQN and soft actor-critic (SAC) algorithms – the detrimental connection between data distributions and learned models. Before describing this problem in detail, let us quickly recap some of the main concepts in dynamic programming.

algorithm, corrective feedback, q-function, (15 more...)

Robohub

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

Gabillon, Victor, Ghavamzadeh, Mohammad, Scherrer, Bruno

Neural Information Processing Systems | Feb-14-2020, 17:42:00 GMT

Tetris is a popular video game that has been widely used as a benchmark for various optimization techniques, including approximate dynamic programming (ADP) algorithms. A close look at the literature of this game shows that while ADP algorithms that have been (almost) entirely based on approximating the value function (value function based) have performed poorly in Tetris, the methods that search directly in the space of policies by learning the policy parameters using an optimization black box, such as the cross entropy (CE) method, have achieved the best reported results. This makes us conjecture that Tetris is a game in which good policies are easier to represent, and thus to learn, than their corresponding value functions. So, in order to obtain good performance with ADP, we should use ADP algorithms that search in a policy space, instead of the more traditional ones that search in a value function space. In this paper, we put our conjecture to the test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris.

adp algorithm, tetris, value function, (4 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)

Approximate Dynamic Programming Finally Performs Well in the Game of Tetris

Gabillon, Victor, Ghavamzadeh, Mohammad, Scherrer, Bruno

Neural Information Processing Systems | Dec-31-2013

Tetris is a popular video game that has been widely used as a benchmark for various optimization techniques, including approximate dynamic programming (ADP) algorithms. A close look at the literature of this game shows that while ADP algorithms that have been (almost) entirely based on approximating the value function (value function based) have performed poorly in Tetris, the methods that search directly in the space of policies by learning the policy parameters using an optimization black box, such as the cross entropy (CE) method, have achieved the best reported results. This makes us conjecture that Tetris is a game in which good policies are easier to represent, and thus to learn, than their corresponding value functions. So, in order to obtain good performance with ADP, we should use ADP algorithms that search in a policy space, instead of the more traditional ones that search in a value function space. In this paper, we put our conjecture to the test by applying such an ADP algorithm, called classification-based modified policy iteration (CBMPI), to the game of Tetris. Our extensive experimental results show that for the first time an ADP algorithm, namely CBMPI, obtains the best results reported in the literature for Tetris on both small $10\times 10$ and large $10\times 20$ boards. Although CBMPI's results are similar to those achieved by the CE method on the large board, CBMPI uses considerably fewer (almost 1/10) samples (calls to the generative model of the game) than CE.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: